Automatic Generation of Assembly to IR Translators Using Compilers

Author

  • Niranjan Hasabnis
Abstract

Translating low-level machine instructions into a higher-level intermediate representation (IR) is one of the central steps in many binary translation, analysis and instrumentation systems. Most of these systems manually build the machine-instruction-to-IR mapping table needed for such a translation. As a result, these systems often suffer from two problems: (a) a great deal of manual effort is required to support new architectures, and (b) even for existing architectures, recent instruction set extensions often go unsupported, e.g., Valgrind’s lack of support for AVX, FMA4 and SSE4.1 on x86 processors. To overcome these difficulties, we propose a novel approach based on learning the assembly-to-IR mapping automatically. Modern compilers such as GCC and LLVM embed knowledge about these mappings in their code generators. By leveraging this knowledge, our approach can greatly reduce the implementation effort required for lifting binary code to IR. Moreover, such an approach is architecture-neutral: it can support the numerous architectures for which GCC (or another compiler) already has a backend. While coverage can be a challenge in learning-based approaches, in this problem domain there is a virtually endless supply of training data, which can be obtained by translating vast quantities of open-source code using compilers such as GCC and LLVM. We present experimental results that demonstrate the promise of our approach. Already, our implementation can support multiple architectures (x86, ARM and AVR), handle binaries of significant size (openssl and binutils), and be applied to multiple compilers (GCC and LLVM).
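
To make the idea of learning an assembly-to-IR mapping from examples concrete, here is a minimal toy sketch. It is not the paper's algorithm: the training pairs, the IR syntax, the register set, and the helper names (abstract, learn, lift) are all assumptions invented for illustration. It generalizes concrete (assembly, IR) pairs into operand-parameterized templates and then uses those templates to lift instructions it has not seen verbatim.

```python
import re

# Hypothetical training pairs (assembly, IR). The IR syntax is invented for
# this sketch; it is neither GCC RTL nor LLVM IR.
PAIRS = [
    ("add eax, ebx", "(set eax (plus eax ebx))"),
    ("add ecx, edx", "(set ecx (plus ecx edx))"),
    ("mov eax, 5", "(set eax (const 5))"),
]

REGISTERS = {"eax", "ebx", "ecx", "edx"}  # assumed toy register set
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+")
PLACEHOLDER = re.compile(r"\$[RI]\d+")


def abstract(asm):
    """Replace registers and immediates in asm with numbered placeholders.

    Returns the template and the mapping concrete operand -> placeholder.
    """
    bindings = {}

    def sub(match):
        tok = match.group(0)
        if tok in REGISTERS:
            kind = "R"
        elif tok.isdigit():
            kind = "I"
        else:
            return tok  # mnemonic: keep as-is
        return bindings.setdefault(tok, f"${kind}{len(bindings)}")

    return TOKEN.sub(sub, asm), bindings


def learn(pairs):
    """Build a table from assembly templates to IR templates."""
    table = {}
    for asm, ir in pairs:
        template, bindings = abstract(asm)
        # Apply the same operand-to-placeholder mapping to the IR side.
        table[template] = TOKEN.sub(
            lambda m: bindings.get(m.group(0), m.group(0)), ir
        )
    return table


def lift(asm, table):
    """Translate one instruction to IR, or return None if no template matches."""
    template, bindings = abstract(asm)
    ir_template = table.get(template)
    if ir_template is None:
        return None
    concrete = {ph: tok for tok, ph in bindings.items()}
    return PLACEHOLDER.sub(lambda m: concrete[m.group(0)], ir_template)


if __name__ == "__main__":
    table = learn(PAIRS)
    print(lift("add ebx, ecx", table))
    print(lift("mov edx, 42", table))
    print(lift("sub eax, ebx", table))  # no training example for sub
```

In this sketch the two add examples collapse into a single template, which is what lets an unseen operand combination be lifted. An instruction with no matching template returns None, which mirrors the coverage concern the abstract raises.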

Similar resources

Benchmarking Code Generation Methodologies for Programmable Digital Signal Processors

We evaluate rapid prototyping tools and compilers as code generation methodologies for programmable digital signal processors (DSPs). Code generated by compilers and rapid prototyping tools has been reported as significantly less efficient in memory usage and execution time than assembly language code written by expert programmers. As the complexity of the system increases, however, the scal...

Evaluation of Automatically-Generated Compilers

Compilers or language translators can be generated using a variety of formal specification techniques. Whether generation is worthwhile depends on the effort required to specify the translation task and the quality of the generated compiler. This paper reports the results from a systematic comparison of a hand-coded translator for the Icon programming language with one generated by the Eli comp...

Handling Multi-Versioning in LLVM: Code Tracking and Cloning

Instrumentation by sampling, adaptive computing and dynamic optimization can be efficiently implemented using multiple versions of a code region. Ideally, compilers should automatically handle the generation of such multiple versions. In this work we discuss the problem of multi-versioning in the situation where each version requires a different intermediate representation. We expose the limits...

Source-to-Source Transformations for Efficient SIMD Code Generation

In recent years, there has been much effort in commercial compilers to generate efficient SIMD instruction-based code sequences from conventional sequential programs. However, the few compilers that can automatically use these instructions achieve unsatisfactory results in most cases. Therefore, the code often has to be written manually in assembly language or using compiler bui...

Improvement of generative adversarial networks for automatic text-to-image generation

This research concerns the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous research has used a single sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions, given in the form of sentences, to produce and improve the image. The proposed ...

Publication date: 2015